We reformulate the Cavity Approximation (CA), a class of algorithms recently introduced for
improving the Bethe approximation estimates of marginals in graphical models. In our new
formulation, which allows for the treatment of multivalued variables, a further generalization
to factor graphs with arbitrary order of interaction factors is explicitly carried out, and a
message passing algorithm that implements the first order correction to the Bethe approximation
is described. Furthermore we investigate an implementation of the CA for pairwise interactions.
In all cases considered we could confirm that CA[k] with increasing k provides a sequence of
approximations of markedly increasing precision. Furthermore in some cases we could also confirm
the general expectation that the approximation of order k, whose computational complexity is
$O(N^{k+1})$, has an error that scales as $1/N^{k+1}$ with the size of the system. We discuss
the relation between this approach and some recent developments in the field.



I. INTRODUCTION 

The Bethe approximation (BA) is one of the major ingredients leading to the important advances
in combinatorial optimisation made by the statistical physics community in recent years. The
starting point of this line of research can be traced back to the inclusion of the
Replica-Symmetry-Breaking (RSB) scheme in the context of the Bethe approximation [?] and the
application of the method to single instances [?]. On the other hand the Bethe approximation
has become a key issue in the context of information theory after it was recognized that the
well known Belief-Propagation (BP) algorithm is tightly related to it [?]. This algorithm was
introduced in the context of Bayesian networks and has gained interest after the discovery that
the fast decoding of Turbo codes and Gallager codes is indeed an instance of BP [?]. Currently
the problem of computing the corrections to the BA is attracting increasing attention (see [?]
for recent literature), not only for the applications mentioned above but also because the BA
is the only way of obtaining a mean-field like solution to many unsolved physical problems,
notably Anderson localization.

In this work we re-investigate the Cavity Approximation (CA), a tool recently introduced in [6]
to study graphical models. The CA is a sequence of approximations defined iteratively such that
the BA corresponds to the zeroth order. Its main features are: i) it can be implemented on a
given sample (much as the Bethe approximation and at variance with the Replica method),
therefore to each approximation corresponds a BP-like algorithm; ii) the expansion at order k
(CA[k]) is correct on graphs with k loops, much as the Bethe approximation is correct on trees;
iii) the computational complexity of the corresponding algorithm grows as $N^{k+1}$; iv) when
averaged over the samples the CA reproduces the results of the Replica method, indeed it
corresponds to computing the $1/N^k$ corrections within the cavity method. In [6] it was argued
that the CA is the natural approximation scheme on locally tree-like structures, in the sense
that CA[k] yields the $O(1/N^k)$ corrections for models defined on random graphs. In this paper
we confirm this expectation by implementing the CA algorithmically; in particular we apply
CA[0] (i.e. BA), CA[1] and CA[2] to instances of graphical models defined on random graphs.
We conclude that the CA is an efficient tool to improve (with polynomial complexity) the BA on
this class of models, which notably includes the error-correcting codes mentioned above. We also
formulate the theory in a representation that allows for straightforward generalization to
factor graphs with arbitrary order of the interaction factors. Message passing equations for
the implementation of such a generalization are given explicitly. We discuss the relationship
between this approach and other approaches that go beyond the BA.



II. THE CAVITY APPROXIMATION: BASIC 
IDEAS 

In [6] the approach was illustrated in the case of binary variables with pairwise interactions.
In the following, for the sake of completeness, we present the case of multivalued variables
with generic pairwise interactions $H_{ij}(x_i, x_j)$. The same ideas and methods can be applied
to models with multiple interactions (factor graphs).

The basic assumption of the BA is that, once a node (say $\sigma_0$ in fig. 1) is removed from
the system, the nodes that were connected to it ($\sigma_1$, $\sigma_2$ and $\sigma_3$) become
uncorrelated. This is true on a tree but it is not true in general if loops not shown in fig. 1
are present. From this assumption one can obtain estimates of local averages of the variables.
We consider two questions:

1. How can we estimate the correlation between node $\sigma_2$ and $\sigma_3$ when node
$\sigma_0$ is removed from the system?

2. How can we use these correlations to improve the estimates of the local averages?

FIG. 1: The marginals of nodes 0 and 1 in the absence of link (01) can be expressed in terms of
the joint probabilities of nodes 0, 4, 5 in the absence of node 1 or of the joint probabilities
of nodes 1, 2, 3 in the absence of node 0. The equality of the results yields the cavity
equations.

In order to answer these questions local cavity distributions are introduced and equations are
derived for them. The equations will not be sufficient to compute all the cavity distributions,
which will be partially estimated through a Bethe-like approximation. For each node $i$ we
denote by $\partial i$ the neighbors of $i$ and define $x_{\partial i} = \{x_j : j \in \partial i\}$.
For each node $i$ we consider its cavity distribution, defined as the distribution
$P^{(i)}(x_{\partial i})$ of its neighbors in the graph obtained by removing the variable $i$
from the original graph. Note that the knowledge of $P^{(i)}(x_{\partial i})$ (the distribution
of the Markov blanket of $i$) is sufficient to determine $P(x_i, x_{\partial i})$ through the
formula:

$$P(x_i, x_{\partial i}) = c\, P^{(i)}(x_{\partial i}) \prod_{j \in \partial i} \psi_{ij}(x_i, x_j) \qquad (1)$$

where $\psi_{ij}(x_i, x_j) = \exp(-\beta H_{ij}(x_i, x_j))$ and $c$ is a normalization constant.
Now we consider the effect of adding to the system without node $x_0$ all the interactions but
$\psi_{01}$. We can express the marginal of site $x_0$ in this system in terms of
$P^{(0)}(x_{\partial 0})$:

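$$P^{(01)}(x_0) = c \left( \prod_{j \in \partial 0 \setminus 1} \sum_{x_j} P^{(0)}(x_j)\, \psi_{0j}(x_0, x_j) + \sum_{\{x_{\partial 0 \setminus 1}\}} \epsilon^{(0)}(x_{\partial 0 \setminus 1}) \prod_{j \in \partial 0 \setminus 1} \psi_{0j}(x_0, x_j) \right) \qquad (2)$$

where $c$ is a normalization and we have introduced the connected cavity correlation of the set
$x_{\partial 0 \setminus 1}$,
$\epsilon^{(0)}(x_{\partial 0 \setminus 1}) = P^{(0)}(x_{\partial 0 \setminus 1}) - \prod_{j \in \partial 0 \setminus 1} P^{(0)}(x_j)$.
The same object may be calculated starting from the system without the variable node $x_1$ and
inserting all interactions but $\psi_{10}(x_1, x_0)$:

$$P^{(01)}(x_0) = P^{(1)}(x_0) + \frac{\sum_{\{x_1, x_{\partial 1 \setminus 0}\}} \epsilon^{(1)}(x_0, x_{\partial 1 \setminus 0}) \prod_{j \in \partial 1 \setminus 0} \psi_{1j}(x_1, x_j)}{\sum_{\{x_1, x_{\partial 1 \setminus 0}\}} P^{(1)}(x_{\partial 1 \setminus 0}) \prod_{j \in \partial 1 \setminus 0} \psi_{1j}(x_1, x_j)} \qquad (3)$$

where we have introduced another cavity connected correlation
$\epsilon^{(1)}(x_0, x_{\partial 1 \setminus 0}) = P^{(1)}(x_0, x_{\partial 1 \setminus 0}) - P^{(1)}(x_0) P^{(1)}(x_{\partial 1 \setminus 0})$
and the suffix $(1)$ means that quantities are computed in the system without node $x_1$.
Equating the r.h.s.'s of eq. (2) and (3), we obtain an equation that connects the cavity
distributions of neighboring nodes:

$$c \left( \prod_{j \in \partial 0 \setminus 1} \sum_{x_j} P^{(0)}(x_j)\, \psi_{0j}(x_0, x_j) + \sum_{\{x_{\partial 0 \setminus 1}\}} \epsilon^{(0)}(x_{\partial 0 \setminus 1}) \prod_{j \in \partial 0 \setminus 1} \psi_{0j}(x_0, x_j) \right) = P^{(1)}(x_0) + \frac{\sum_{\{x_1, x_{\partial 1 \setminus 0}\}} \epsilon^{(1)}(x_0, x_{\partial 1 \setminus 0}) \prod_{j \in \partial 1 \setminus 0} \psi_{1j}(x_1, x_j)}{\sum_{\{x_1, x_{\partial 1 \setminus 0}\}} P^{(1)}(x_{\partial 1 \setminus 0}) \prod_{j \in \partial 1 \setminus 0} \psi_{1j}(x_1, x_j)} \qquad (4)$$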

We note that this equation is exact and is valid also if some of the nodes connected to $x_0$
coincide with those connected to $x_1$. We have $2L$ such equations, two for each link (the
other equation for link (01) is obtained by exchanging indices in eq. (4) according to
$\{0 \leftrightarrow 1, 2 \leftrightarrow 4, 3 \leftrightarrow 5\}$). Unfortunately these
equations are not sufficient to determine the full set of cavity distributions. This is easily
seen by noticing that only if we knew all the connected cavity correlations
$\epsilon^{(i)}(x_j, x_{\partial i \setminus j})$ and $\epsilon^{(j)}(x_{\partial j \setminus i})$
for each link $(i,j)$ would the $2L$ cavity equations be in principle sufficient to determine
the remaining $2L$ unknown cavity distributions $P^{(i)}(x_j)$. The Bethe approximation assumes
that the variable nodes on the cavity of node $i$ are uncorrelated in the absence of node $i$.
As a consequence the corresponding probability distributions are factorized and the connected
correlations are zero ($\epsilon^{(i)}(x_j, x_{\partial i \setminus j}) = 0$,
$\epsilon^{(j)}(x_{\partial j \setminus i}) = 0$) for each link $(i,j)$, therefore eq. (4)
reduces to the standard Belief-Propagation equation.
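
As an illustration of the Bethe limit of eq. (4), in which all connected cavity correlations are set to zero, the following Python sketch iterates the resulting Belief-Propagation update for a pairwise model with q-valued variables and no single-site terms. It is a minimal sketch and not the implementation used in this work: the function names, the dictionary layout of the potentials `psi` and the brute-force checker `exact_marginals` are assumptions introduced here for illustration.

```python
import itertools
import numpy as np

def belief_propagation(n, edges, psi, q, iters=200, tol=1e-12):
    """Iterate the BP limit of the cavity equations for a pairwise model.

    msg[(i, j)] plays the role of the cavity marginal P^(i)(x_j): the
    distribution of x_j on the graph from which node i has been removed.
    psi[(a, b)] is a (q, q) array holding psi_ab(x_a, x_b) for each edge.
    """
    nbrs = {v: set() for v in range(n)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    def pot(a, b):
        # psi_ab indexed as [x_a, x_b], whichever way the edge is stored.
        return psi[(a, b)] if (a, b) in psi else psi[(b, a)].T

    msg = {(i, j): np.full(q, 1.0 / q) for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        delta = 0.0
        for (i, j) in msg:
            new = np.ones(q)
            for k in nbrs[j] - {i}:
                # sum over x_k of psi_jk(x_j, x_k) * P^(j)(x_k)
                new *= pot(j, k) @ msg[(j, k)]
            new /= new.sum()
            delta = max(delta, np.abs(new - msg[(i, j)]).max())
            msg[(i, j)] = new
        if delta < tol:
            break

    marginals = []
    for i in range(n):
        belief = np.ones(q)
        for k in nbrs[i]:
            belief *= pot(i, k) @ msg[(i, k)]
        marginals.append(belief / belief.sum())
    return msg, marginals

def exact_marginals(n, edges, psi, q):
    """Brute-force single-site marginals, usable only for very small n."""
    joint = np.zeros((q,) * n)
    for x in itertools.product(range(q), repeat=n):
        w = 1.0
        for a, b in edges:
            w *= psi[(a, b)][x[a], x[b]]
        joint[x] = w
    joint /= joint.sum()
    return [joint.sum(axis=tuple(k for k in range(n) if k != i))
            for i in range(n)]
```

On a tree the two functions should agree to numerical precision, which is the sense in which the BA is exact on trees; on a graph with loops the difference between them is the error that the higher orders CA[k] aim to reduce.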

III. ESTIMATING THE CAVITY 
DISTRIBUTION 

In general, if we have an estimate of $P^{(j)}(x_{\partial j})$ for any node $j$ we can compute
the various connected correlations in eq. (4) and solve the cavity equations, obtaining a new
estimate of $P^{(i)}(x_j)$. In the following we argue that to estimate the joint probability
distribution $P^{(j)}(x_{\partial j})$ it is sufficient to have an algorithm (e.g. BP) that
estimates single site marginals $P(x_i)$. Indeed suppose that we have such an algorithm; then
in order to get an estimate of $P^{(j)}(x_{\partial j})$ we remove node $j$ from the graph and
evaluate $P^{(j)}(x_{j_1})$ through the given algorithm, where
$\{j_1, \dots, j_k\} = \partial j$. Then we fix the value of $x_{j_1}$ and compute
$P^{(j)}(x_{j_2} | x_{j_1})$ through the same algorithm, and so on. In the end the distribution
can be reconstructed from the formula:

$$P^{(j)}(x_{\partial j}) = P^{(j)}(x_{j_1}) \prod_{i=2}^{k} P^{(j)}(x_{j_i} | x_{j_1}, \dots, x_{j_{i-1}}) \qquad (5)$$

where $k$ is the number of nodes on the cavity of $j$. In other words, in order to determine
$P^{(j)}(x_{\partial j})$ we have to run the approximate algorithm removing site $j$ and fixing
sequentially the values of $x_{\partial j}$. Note that this procedure to estimate the cavity
connected correlations by sequentially fixing the values of the cavity spins is easier to
implement than the use of the Fluctuation-Dissipation Theorem originally proposed in [6], since
the latter requires taking derivatives of eqs. (4).
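
The clamping procedure of eq. (5) can be sketched in a few lines of Python. The sketch below is only illustrative: it assumes the cavity graph is small enough to be stored as a dense array of unnormalised weights, and it uses an exact marginalisation routine (`single_site_marginal`, a hypothetical stand-in) where the text would use BP or CA[k-1]. All names are assumptions made here.

```python
import itertools
import numpy as np

def single_site_marginal(weights, site, clamped):
    """Exact stand-in for the approximate single-site algorithm (e.g. BP).

    weights: array of shape (q,)*n with unnormalised weights of the cavity graph.
    clamped: dict {variable index: fixed value}; `site` must not be clamped.
    Returns the distribution of x_site conditioned on the clamped values.
    """
    index = tuple(clamped.get(v, slice(None)) for v in range(weights.ndim))
    sub = weights[index]
    free = [v for v in range(weights.ndim) if v not in clamped]
    axis = tuple(a for a, v in enumerate(free) if v != site)
    p = sub.sum(axis=axis)
    return p / p.sum()

def joint_by_clamping(weights, cavity_nodes, marginal=single_site_marginal):
    """Reconstruct the joint distribution of `cavity_nodes` via the chain rule.

    Every variable is assumed to take the same number q of values.
    """
    q = weights.shape[0]
    joint = np.zeros((q,) * len(cavity_nodes))
    for values in itertools.product(range(q), repeat=len(cavity_nodes)):
        prob, clamped = 1.0, {}
        for node, value in zip(cavity_nodes, values):
            # conditional of this node given the nodes already clamped
            prob *= marginal(weights, node, clamped)[value]
            clamped[node] = value
        joint[values] = prob
    return joint
```

With an exact `marginal` the reconstruction reproduces the true joint distribution; with an approximate one it inherits that algorithm's error. The loop over all assignments of the cavity variables makes explicit the cost that grows exponentially with the size of the cavity.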

Any algorithm may be used to obtain a first estimate of $P^{(j)}(x_{\partial j})$; in particular
we can use the BA and obtain a new cavity approximation of order 1 (CA[1]). The procedure can
be iterated, yielding CA[k] (with CA[0]=BA):

1. write the exact cavity equations for the system

2. for each variable node $i$:

(a) remove $x_i$

(b) express $P^{(i)}(x_{\partial i})$ in terms of conditional probabilities through eq. (5)

(c) use CA[k-1] to compute the conditional probabilities, compute $P^{(i)}(x_{\partial i})$
and then $\epsilon^{(i)}$

3. substitute the estimates of $\epsilon^{(i)}$ into the exact equations and recompute the
$2L$ cavity distributions $P^{(i)}(x_j)$.

In practice the procedure can be implemented through a message-passing algorithm whose
computational complexity grows with order $k$ as $N^{k+1}$.



IV. PERTURBATIVE APPROACH FOR 
PRACTICAL IMPLEMENTATIONS 

We note that the use of eq. (5) requires the application of the algorithm CA[k-1] a number of
times exponential in the size of the cavities, therefore it may be convenient to use an
approximate expression of eq. (5). In the following we discuss one such approximation. For a
given set of nodes $A$ we define the connected correlation functions as usual, in particular we
have: $c(x) = P(x)$, $c(x,y) = P(x,y) - P(x)P(y)$ and so on. The probability distribution of a
set of nodes $A$ can be written as:

$$P(x_A) = \sum_{[A_1, \dots, A_n]} c(x_{A_1}) \cdots c(x_{A_n}) \qquad (6)$$

where $[A_1, \dots, A_n]$ runs over the partitions of $A$. Under some conditions one can assume
that $P(x)$ is $O(1)$ while $c(x,y)$ is small, say $O(\epsilon)$ (where $\epsilon$ is some small
parameter), $c(x,y,z) = O(\epsilon^2)$ and so on. For instance in the representation
$P(x_A) \propto \exp[\sum_i a_i(x_i) + \sum_{i<j} a_{ij}(x_i,x_j) + \sum_{i<j<k} a_{ijk}(x_i,x_j,x_k) + \dots]$
this approximation is valid if the interaction terms between $k$ variables are proportional to
$\epsilon^{k-1}$. As before the connected correlation functions can be expressed through
conditional probabilities, i.e. $c(x,y) = (P(x|y) - P(x))P(y)$, and can be determined through
any algorithm that yields the local distributions $P(x)$. These observations can be used to
reduce the number of quantities to be estimated at each cavity; in particular, steps 2.b and
2.c can be modified in the following way:

(b) express $P^{(i)}(x_{\partial i})$ through eq. (6) assuming that all connected correlation
functions of more than k + 1 nodes vanish.

(c) use CA[k-1] on the corresponding system to determine the connected correlations through
conditional probabilities.
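
To make the truncation of eq. (6) concrete, the sketch below rebuilds the joint distribution of three variables from its one-point and two-point connected correlation functions only, i.e. with $c(x,y,z)$ set to zero. This is an illustrative example under stated assumptions, not the code used for the results: the function name and the dense-array representation of the distribution are choices made here.

```python
import numpy as np

def truncate_three_point(joint):
    """Rebuild P(x, y, z) from eq. (6) with the three-point term set to zero.

    joint: normalised array of shape (qx, qy, qz).
    Returns (approximation, neglected three-point connected correlation).
    """
    px = joint.sum(axis=(1, 2))
    py = joint.sum(axis=(0, 2))
    pz = joint.sum(axis=(0, 1))
    # two-point connected correlations c(x,y) = P(x,y) - P(x)P(y), etc.
    cxy = joint.sum(axis=2) - np.outer(px, py)
    cxz = joint.sum(axis=1) - np.outer(px, pz)
    cyz = joint.sum(axis=0) - np.outer(py, pz)
    approx = (np.einsum("i,j,k->ijk", px, py, pz)
              + np.einsum("ij,k->ijk", cxy, pz)
              + np.einsum("ik,j->ijk", cxz, py)
              + np.einsum("jk,i->ijk", cyz, px))
    return approx, joint - approx
```

By construction the truncated distribution keeps the exact one-point and two-point marginals, so the only information discarded is the three-point connected correlation returned as the second value. Note that the truncated array is normalised but is not guaranteed to be non-negative when the neglected term is large.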

In the following we call CA[k] the approximation scheme that includes the previous assumption.
It was shown in [6] that CA[k] is exact on graphs with k loops, much as the Bethe approximation
is exact on trees. It can be argued that this approximation scheme yields the perturbative
expansion in powers of $1/N$ on models defined on random graphs of size $N$; roughly speaking,
this means that CA[k] yields the local marginals with an error $O(1/N^{k+1})$. Indeed in the
large $N$ limit random graphs are locally tree-like, the loops typically being large. On a
locally tree-like portion of a random graph the 2-point cavity connected correlations are
determined by large loops and therefore are small; the 3-point cavity correlations depend on
the correlations between these large loops and are even smaller; in general we expect that the
cavity correlations of $k$ nodes yield an effect $O(1/N^{k-1})$. Therefore in such a region we
expect that CA[k] is really a perturbative expansion. On the other hand small loops (of length
$\ell \ll \ln N$) are definitely present in random graphs, see [9] and references therein. The
typical graph contains a finite number of small loops and in general graphs with a finite
number of small structures of $k$ nearby loops have probability $O(1/N^{k-1})$. Using the
exactness of CA[k] on graphs with k loops mentioned above it can be argued that the presence of
these small loops does not destroy the perturbative nature of the expansion.

V. GENERALIZATION TO ARBITRARY 
FACTOR GRAPHS. 

The above strategy, which up to now has been restricted to two-variable interaction models, may
be generalized surprisingly easily for factor graphs with an arbitrary number of variables in
each factor. We will write down exact equations, as before, for a definition of epsilon
functions that corresponds to an expansion around totally factorizing cavity distributions, and
later we will neglect higher order terms. The resulting equations explicitly yield a message
passing algorithm that takes into account the first order correction (CA[1]) to belief
propagation. In our notation, Roman indices ($i, j, k, \dots$) will denote variables and Greek
indices ($\alpha, \beta, \gamma, \dots$) denote factors. The factor indices are understood to
simultaneously represent the subset of Roman indices corresponding to the variables in the
factor.



A. Exact equations 

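The exact equations for the marginal of variable $x_i$ in the absence of factor $\alpha$ read

$$P^{(\alpha)}(x_i) = c \Big[ \prod_{\beta \in \partial i \setminus \alpha} \sum_{x_{\beta \setminus i}} \psi_\beta(x_\beta) \prod_{j \in \beta \setminus i} P^{(i)}(x_j) + \sum_{x_{\partial i \setminus \alpha}} \epsilon^{(i)}(x_{\partial i \setminus \alpha}) \prod_{\beta \in \partial i \setminus \alpha} \psi_\beta(x_\beta) \Big] \qquad (7)$$

$$= P^{(j)}(x_i) + \frac{\sum_{x_j, x_{\partial j \setminus \alpha}} \epsilon^{(j)}(x_i, x_{\partial j \setminus \alpha}) \prod_{\beta \in \partial j \setminus \alpha} \psi_\beta(x_\beta)}{\sum_{x_j, x_{\partial j \setminus \alpha}} P^{(j)}(x_{\partial j \setminus \alpha}) \prod_{\beta \in \partial j \setminus \alpha} \psi_\beta(x_\beta)} \qquad (8)$$

where $j \in \alpha \setminus i$, $x_{\partial i \setminus \alpha}$ denotes the variables other
than $x_i$ that belong to the factors $\beta \in \partial i \setminus \alpha$, and the
expansion parameters are given by

$$\epsilon^{(i)}(x_{\partial i \setminus \alpha}) = P^{(i)}(x_{\partial i \setminus \alpha}) - \prod_{\beta \in \partial i \setminus \alpha} \prod_{l \in \beta \setminus i} P^{(i)}(x_l) \qquad (9)$$

$$\epsilon^{(j)}(x_i, x_{\partial j \setminus \alpha}) = P^{(j)}(x_i, x_{\partial j \setminus \alpha}) - P^{(j)}(x_i)\, P^{(j)}(x_{\partial j \setminus \alpha}) \qquad (10)$$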

B. Truncated expansion 

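In the following, we will assume that $\forall i, \forall \alpha_1, \alpha_2 \in \partial i$ we
have that $\alpha_1 \cap \alpha_2 = \{i\}$. Up to first order in two-point connected
correlations, we may write

$$\epsilon^{(j)}(x_i, x_{\partial j \setminus \alpha}) \simeq \sum_{\beta \in \partial j \setminus \alpha} \sum_{k \in \beta \setminus j} c^{(j)}(x_i, x_k) \prod_{m \in \beta \setminus \{k, j\}} [P^{(j)}(x_m)] \prod_{\gamma \in \partial j \setminus \{\alpha, \beta\}} \Big[ \prod_{n \in \gamma \setminus j} P^{(j)}(x_n) \Big] \qquad (11)$$

$$\epsilon^{(i)}(x_{\partial i \setminus \alpha}) \simeq \sum_{\beta \in \partial i \setminus \alpha} \sum_{k<l \in \beta \setminus i} c^{(i)}(x_k, x_l) \prod_{m \in \beta \setminus \{k, l, i\}} [P^{(i)}(x_m)] \prod_{\gamma \in \partial i \setminus \{\alpha, \beta\}} \Big[ \prod_{n \in \gamma \setminus i} P^{(i)}(x_n) \Big]$$
$$\qquad + \sum_{\beta<\gamma \in \partial i \setminus \alpha} \sum_{k \in \beta \setminus i} \sum_{l \in \gamma \setminus i} c^{(i)}(x_k, x_l) \prod_{m \in \beta \setminus \{i, k\}} [P^{(i)}(x_m)] \prod_{n \in \gamma \setminus \{i, l\}} [P^{(i)}(x_n)] \prod_{\eta \in \partial i \setminus \{\alpha, \beta, \gamma\}} \Big[ \prod_{r \in \eta \setminus i} P^{(i)}(x_r) \Big] \qquad (12)$$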



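Let us introduce some notation:

$$\mu_{\alpha \to i}(x_i) = \sum_{x_{\alpha \setminus i}} \prod_{j \in \alpha \setminus i} [P^{(i)}(x_j)]\, \psi_\alpha(x_\alpha) \qquad (13)$$

$$\lambda_{\alpha \to i}(x_i) = \sum_{x_{\alpha \setminus i}} \sum_{j<k \in \alpha \setminus i} c^{(i)}(x_j, x_k) \prod_{l \in \alpha \setminus \{i, j, k\}} [P^{(i)}(x_l)]\, \psi_\alpha(x_\alpha) \qquad (14)$$

$$\rho_{(\alpha,\beta) \to i}(x_i) = \sum_{x_{\alpha \setminus i}} \sum_{x_{\beta \setminus i}} \sum_{j \in \alpha \setminus i} \sum_{k \in \beta \setminus i} c^{(i)}(x_j, x_k)\, \psi_\alpha(x_\alpha) \psi_\beta(x_\beta) \prod_{l \in \alpha \setminus \{i, j\}} [P^{(i)}(x_l)] \prod_{m \in \beta \setminus \{i, k\}} [P^{(i)}(x_m)] \qquad (15)$$

$$\nu_{\alpha \to i}(x_i, x_j) = \sum_{x_{\alpha \setminus i}} \sum_{k \in \alpha \setminus i} c^{(i)}(x_j, x_k) \prod_{l \in \alpha \setminus \{i, k\}} [P^{(i)}(x_l)]\, \psi_\alpha(x_\alpha) \qquad (16)$$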



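These functions may be interpreted as generalized messages, where the $\mu_{\alpha \to i}(x_i)$
are the familiar ones appearing in belief propagation. Putting things together, we may write up
to first order:

[Eq. (17) is not legible in the source.] (17)

where

$$G^{(\alpha)}(x_i) = \prod_{\beta \in \partial i \setminus \alpha} \mu_{\beta \to i}(x_i) + \sum_{\beta \in \partial i \setminus \alpha} \lambda_{\beta \to i}(x_i) \prod_{\gamma \in \partial i \setminus \{\alpha, \beta\}} \mu_{\gamma \to i}(x_i) + \sum_{\beta<\gamma \in \partial i \setminus \alpha} \rho_{(\beta,\gamma) \to i}(x_i) \prod_{\eta \in \partial i \setminus \{\alpha, \beta, \gamma\}} \mu_{\eta \to i}(x_i) \qquad (18)$$

The true marginals

$$P(x_i) = c \sum_{x_{\partial i}} P^{(i)}(x_{\partial i}) \prod_{\beta \in \partial i} \psi_\beta(x_\beta) \qquad (19)$$

up to first order read

$$P(x_i) = \frac{G(x_i)}{\sum_{x_i} G(x_i)} \qquad (20)$$

$$G(x_i) = \prod_{\beta \in \partial i} \mu_{\beta \to i}(x_i) + \sum_{\beta \in \partial i} \lambda_{\beta \to i}(x_i) \prod_{\gamma \in \partial i \setminus \beta} \mu_{\gamma \to i}(x_i) + \sum_{\beta<\gamma \in \partial i} \rho_{(\beta,\gamma) \to i}(x_i) \prod_{\eta \in \partial i \setminus \{\beta, \gamma\}} \mu_{\eta \to i}(x_i) \qquad (21)$$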



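From these equations the pair-interaction case may straightforwardly be recovered: the
$\lambda$ messages do not appear, and the remaining messages are given by

$$\mu_{\alpha \to i}(x_i) = \sum_{x_k} P^{(i)}(x_k)\, \psi_\alpha(x_i, x_k) \qquad (22)$$

$$\rho_{(\alpha,\beta) \to i}(x_i) = \sum_{x_k, x_l} c^{(i)}(x_k, x_l)\, \psi_\alpha(x_i, x_k)\, \psi_\beta(x_i, x_l) \qquad (23)$$

$$\nu_{\alpha \to i}(x_i, x_j) = \sum_{x_k} c^{(i)}(x_j, x_k)\, \psi_\alpha(x_i, x_k) \qquad (24)$$

Note that the messages $\lambda_{\alpha \to i}$, $\rho_{(\alpha,\beta) \to i}$ and
$\nu_{\alpha \to i}$ should all be "small" compared to $\mu_{\alpha \to i}$ in an absolute
sense. If they are not, the method is bound to fail in this order of approximation. Indeed
these messages are not normalized, contrary to the $P^{(i)}(x_j)$. When the "small" messages
are neglected altogether, it is easily seen that one recovers the belief propagation equations.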



C. Complexity issues 

In the above form, we can nicely distinguish the dependence of the complexity of the algorithm
on factor size versus size of the cavities. The computation of the messages looks exponential
in the quadratic factor size, but this cost may be reduced by storing quantities of the form

$$\psi_\alpha(x_i, x_j) = \sum_{x_{\alpha \setminus \{i, j\}}} \prod_{k \in \alpha \setminus \{i, j\}} [P^{(i)}(x_k)]\, \psi_\alpha(x_\alpha) \qquad (25)$$

Using these quantities, the computation time scales slightly worse than exponential in the
factor size. The dependence on the number of factors in a cavity is found from equation (17),
from which it is obvious that this dependence is quadratic.



VI. RESULTS: CONFIRMING SCALING OF
THE ERROR WITH N

In [6] the perturbative nature of the expansion on random graphs has been confirmed by
computing the average of the energy density in the paramagnetic phase of a spin-glass defined
on a random graph. In particular it has been checked that the first order approximation yields
the $O(1/N)$ correction to the energy that was computed independently through the replica
method.

Here we report tests of the approximation and the corresponding algorithms on specific
instances of random graphs. We have applied the algorithms CA[0] (i.e. BP), CA[1] and CA[2] to
systems of binary variables (spins $\sigma_i = \pm 1$, $i \in \{1, \dots, N\}$) described by the
microstate probability distribution

$P(\sigma)$ [right-hand side not legible in the source] (26)

The nonzero entries of the matrix $J_{ij}$ form a random graph of fixed connectivity equal to
three, and we subsequently investigated ferromagnetic interactions ($J_{ij} = 1$ for all
nonzero entries) and spin-glass interactions ($J_{ij} = \pm 1$ with equal probability for all
nonzero entries).

The use of binary variables allows one to write the equations (4) in terms of magnetizations
and connected correlation functions (see [6]). As explained in section IV, it is assumed that
all connected cavity correlations of more than k + 1 spins vanish when applying the CA[k-1]
algorithm in the intermediate step of the CA[k] algorithm. In the top panel of figure 2 we
report the results for a ferromagnet with $H_i = 0\ \forall i$ at $\beta = 0.3$, corresponding
to a paramagnetic phase (note that for $N \to \infty$ the critical temperature is given by
$\beta_c = \frac{1}{2}\log(3)$). We compared the various estimates obtained with CA[0], CA[1]
and CA[2] with the exact result obtained through a junction tree algorithm; thus we were forced
to consider system sizes up to $N = 120$, although the algorithms we are considering can be
applied to much larger systems. For different sizes of the system we plot the average over 100
random instances of the error of the estimate of the total energy and of the average
mean-squared error of the energy of each link. As expected we see that the three algorithms
give results of increasing precision; furthermore we see that the error of the BP (CA[0]) total
energy scales with the system size as $1/N$ while those of CA[1] and CA[2] scale respectively
as $1/N^2$ and $1/N^3$.

FIG. 2: Top: ferromagnet on random graph with $\beta = 0.3$, fixed connectivity 3 and 100
samples. Data points represent averages of the errors of the total energy and mean-squared
error per link of the estimates of CA[0] (blue, cyan), CA[1] (light green, dark green) and
CA[2] (red, magenta), as a function of the sample size $N$, see text. Bottom: same as above,
for a spin-glass model, but the average is taken in the log-domain (see text).

FIG. 3: Top: Average energy per link as a function of $\beta$ for an $N = 120$ random
ferromagnet as obtained exactly via junction tree (magenta), BP or CA[0] (red x), CA[1] with
response propagation initialisation (blue +), CA[1] initialised with the clamping procedure
discussed in this paper (green o) and CA[2] (only in the paramagnetic regime, black □).
Bottom: Mean square error of link energy, same colour code as top. Note that in the regime
approaching the phase transition the clamping strategy (green o) seems to perform better than
the response propagation procedure.
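
The scaling exponents quoted above can be estimated from data by a straight-line fit in the log-log plane. The sketch below shows one way to do this; it is not the analysis code behind figure 2, the function name is an assumption, and any numbers fed to it in an example would be synthetic rather than the measured errors.

```python
import numpy as np

def scaling_exponent(sizes, errors):
    """Least-squares slope of log(error) versus log(N).

    For an error that behaves as A / N**p the returned slope is close to -p.
    """
    slope, _intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return slope
```

For instance, applying the function to errors generated exactly as $3/N^2$ on sizes between 20 and 120 returns a slope of $-2$ up to rounding, which is the signature expected for CA[1] in the text. On real data with only a few system sizes the fitted slope should be read together with its scatter over instances.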

In the bottom panel of figure 2 we report analogous results for a spin-glass, again with
$H_i = 0$ at $\beta = 0.3$ and with random interactions $J_{ij} = \pm 1$. We note that both
cases correspond to a paramagnetic region (the critical temperature for the $N \to \infty$
spin-glass is given by $\beta_c = \frac{1}{2}\log\frac{\sqrt{2}+1}{\sqrt{2}-1}$), although the
algorithm can be applied also in the ferromagnetic region. In the paramagnetic region, however,
one may exploit the fact that odd moments of all (marginal) distributions are zero,
significantly reducing the complexity of the algorithm. The results for the spin glass model
naturally display more fluctuations than the ones for the ferromagnet, since the interaction
values are drawn from a distribution, whereas for the ferromagnet they are all equal and thus
identical for each of the 100 instances. Since large deviations dominate the average errors for
small error values, we plotted error averages in the log domain for the spin glass, i.e.
$\exp(\langle \log(\Delta E) \rangle)$. Although the correspondence is less convincing than for
the ferromagnet, the scaling of errors roughly follows the same exponents. The deviation from
this behaviour should disappear for larger $N$. Details of the corresponding equations in the
algorithm, using linear response, are given in appendix A.

FIG. 4: Model of ferromagnet on a $k = 3$, $N = 60$ random graph with nonzero magnetizations,
random normally distributed external fields of variance 1 and average 1 (top figure) or 0.2
(bottom figure). Circles correspond to log root mean square error of link energies for BP
(higher, red) versus first order corrected BP (lower, blue). Squares denote log root mean
square errors of single variable averages for BP (higher, magenta) and first order corrected BP
(lower, black). For average external field 0.2 (bottom figure) the first order corrected
algorithm does not converge for $\beta > 0.9$.
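
The log-domain average used for the spin-glass data, the exponential of the mean logarithm of the errors, is the geometric mean. A minimal sketch follows; the function name is an assumption made here, and the inputs are expected to be strictly positive error values.

```python
import numpy as np

def log_domain_average(errors):
    """exp of the mean of log(errors), i.e. the geometric mean of the errors."""
    errors = np.asarray(errors, dtype=float)
    return float(np.exp(np.mean(np.log(errors))))
```

Compared with the arithmetic mean, this average is far less sensitive to a few instances with unusually large errors, which is the reason given in the text for using it when the errors themselves are very small.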

Figure 3 illustrates the $\beta$-dependence: the top panel shows the total energy and the
bottom panel the root mean square error of link energies, both as a function of $\beta$, for a
ferromagnet with $N = 120$. Clearly the CA[1] and CA[2] methods outperform the BA in all but a
small region around the "phase transition", where correlation lengths diverge and consequently
the connected correlation terms blow up. Naturally, perturbative approaches do not result in
improved estimates of marginals in this regime; in fact the CA[1] and CA[2] methods cease to
converge around $\beta = 0.55$. Note that the most difficult part is not estimating the
connected correlations, which is based on the BA (and BP converges relatively close to the
critical $\beta$); rather, it is the adapted update equations for $P^{(i)}(x_j)$, equation (4)
in CA[1] (equation (A1) in the appendix), that do not converge. In general one might be able to
optimize an update scheme for these equations, which we have not attempted here.

Apart from problems around the critical value of $\beta$, for larger $\beta$ the inversion
symmetry in the model is broken by the approximation algorithms, which consequently disregard
the mirror-state free energy valley. In these small-scale models, this does affect the results,
as can be seen from figure 3.

When the symmetry of the model is already broken by a sufficiently large external field, a
situation which is common in statistical inference applications, where the external fields may
originate from Bayesian priors or represent evidence from measurement data (see e.g. [?]), this
phenomenon does not occur. The CA[1] algorithm consistently improves the marginal estimates
over the whole range of $\beta$, as illustrated in figure 4. When the average external field is
relatively small, symmetry breaking might again prevent convergence (figure 4, bottom, for
$\beta > 0.9$).

In the "magnetized" regime of this model, one may do a similar scaling analysis as displayed in
figure 2. Results are reported in figure 5 for a model with ferromagnetic interactions and
different values of the external field average, where we plot the error of the first order
CA[1] algorithm as a function of $N$. Although the results display more fluctuations, the
behaviour is similar, in the sense that one observes on average a comparable power-law scaling
with $N$, showing that indeed, as long as the parameters correspond to regions not in the
vicinity of a phase transition and the correlation lengths remain typically small compared to
the loop length, the approach is promising.



VII. DISCUSSION 

The implemented approach is intrinsically perturbative around the BA, in the sense that the BA
gives accurate results if the correction terms in eq. (4) are small, and therefore it is
natural to guess that CA[1] will produce better results. At the same time if the corrections
turn out not to be small, this hints at poor BP estimates, and the whole approach is in trouble
(see the bottom panel of figure 3). Furthermore we cannot compute CA[1] if BP does not
converge. However we recall that any algorithm can be used as the starting point CA[0] of the
sequence of approximations. In conclusion we expect that whenever BP converges and yields good
estimates, CA[k] yields a series of approximations of increasing precision. In particular for
graphical models defined on random graphs where small loops are rare, CA[k] gives estimates
with an error $O(1/N^{k+1})$ and with computational complexity $N^{k+1}$. Note that this last
class of models includes some of the most important present-day error-correcting codes for
which the decoding scheme is BP. The reason why BP is so efficient in these cases is precisely
that in the corresponding graphical models small loops are rare. Therefore we expect that by
the application of CA[k] the marginals can be computed with any precision in polynomial time.
It is important to realize that this does not completely solve the problem of the $1/N$
finite-size effect in error-correcting codes: even if we know the exact marginals, there is
still the possibility that some of them are not consistent with the encoded original message.

FIG. 5: Scaling behaviour for a ferromagnet in a broken symmetry magnetized regime caused by
normally distributed external fields $H_i$ of variance 1 and average 0.5 (top) versus 1
(bottom). Blue (x): root mean square error in BP correlations; magenta (∇): root mean square
error in BP site magnetizations. Red (□): root mean square error in CA[1] correlation; green
(+): root mean square error in CA[1] site magnetizations. Black (△): error in CA[1] mean energy
per link. All data are averages over 100 instances of a random graph of connectivity 3.

VIII. RELATION WITH OTHER APPROACHES

The previous comments should help the reader to understand the natural context of the present
approach and to clarify its relationship with different approaches. A well known generalisation
of the BA is Kikuchi's cluster variation method (CVM), which is particularly suitable for
finite-dimensional models and in general for models where many small loops are present; indeed
this approach amounts to treating loops up to a certain length exactly. On the other hand on
random
graphs the corrections to the BA are determined not by 
small loops (which are rare) but by many large loops. 
The CVM does not apply to such cases since, in order to 
include the effect of the large loops, the size of the ba- 
sic clusters that it treats exactly should be of the order 
of the total system size, with prohibitive computational 
complexity. On the other hand it is natural to expect 
that CVM performs much better than CA[fc] on graphi- 
cal models defined on structures with many small loops 
like lattices. Thus the cavity approaches are complemen- 
tary to CVM, in the sense that both methods have their 
own well-defined range of applications, although one can 
imagine applications that could be best studied through 
a mixture of them. 

In a recent publication Chertkov and Chernyak (CC) [?] obtained the free energy of a generic
graphical model as an expansion around the BA written in terms of diagrams corresponding to
sub-graphs with one loop, two loops and so on. In spite of their claim that this represents an
improvement with respect to the approach presented here, we believe that the two approaches
have different motivations and capabilities. The present approach addresses the problem of
improving the computation of marginals with polynomial algorithms for models defined on random
graphs (with error correcting codes being a notable example of this class of models) and it is
as yet not clear if similar results are achievable within the CC approach. Indeed we know that
the $1/N$ corrections computed by CA[1] with $N^2$ complexity are determined by exponentially
many large loops (each one yielding a small contribution); therefore it seems likely that in
order to obtain results of the quality of CA[1] (i.e. the $1/N$ corrections) one should
consider all graphs with one loop in the CC expansion, yielding an exponential number of terms,
which is computationally prohibitive unless some resummation scheme is supplemented. Recently
([13]) an algorithm was tested based on truncation of the series, which may work in cases where
one is able to identify the most important loops that contribute to the BP error, when there
are not too many.

In a very recent paper [14], a number of different algorithms based on similar ideas as the
above have been described, and have been applied to some real-world problems. Given an estimate
of the cavity distributions, the update relations in [14] are based on an adjustment of
external fields (keeping higher order interactions in the cavity distribution fixed, whereas we
keep the higher order connected correlations fixed). Although the connection with higher order
improvements in their scheme is lost, the algorithms are sometimes easier to implement in the
first order case, if the connectivity of the graph is not too large.

Some open problems of the present approach include the computation of the corrections to the
free energy (currently we know only how to improve the marginals, therefore we have access only
to corrections to local quantities such as the magnetization and the energy) and the extension
to the spin-glass phase with the inclusion of Replica-Symmetry-Breaking effects.

APPENDIX A: CA[1] AND CA[2] UPDATE
EQUATIONS FOR CONNECTIVITY 3.



1. CA[1] updates

The update equations for the first moment $M^{(i)}_j$ of a cavity marginal $P^{(i)}(x_j)$ may
be written out in terms of connected correlation functions, i.e.
$C^{(i)}_{kl} = \sum_{x_k, x_l} c^{(i)}(x_k, x_l)\, x_k x_l$ and
$C^{(i)}_{klm} = \sum_{x_k, x_l, x_m} c^{(i)}(x_k, x_l, x_m)\, x_k x_l x_m$ (see [6]):

[Eqs. (A1)-(A3) are not legible in the source.]

The solution of these equations leads to the moment of 
the true marginal P{xi) via 



[Eq. (A4) is not legible in the source.]

Correspondingly, the nearest neighbour correlations read

[Eqs. (A5) and (A6) are not legible in the source.]

In the CA[1] approximation, the two-point connected correlations are estimated by some
algorithm, possibly CA[0] (another option is to use response propagation, see [?]), and the
three-point connected correlations are neglected.



2. CA[2] updates 

The CA[2] algorithm in turn uses improved estimates 
of the two-point connected correlations of which the ac- 
curacy corresponds to CA[1], together with CA[0] (or re- 
sponse propagation) three-point estimates. We used re- 
sponse propagation to compute the CA[1] accurate two- 
point cavity connected correlations. This implies we ex- 
ploit 



$$C^{(j)}_{ik} = \frac{\partial M^{(j)}_i}{\partial H_k} \qquad (A7)$$

but $M^{(j)}_i$ is computed with CA[1] accuracy, i.e., from equation (A4) on the graph from
which variable $j$ has been removed. This may be achieved by simply taking the derivative of
the right hand side of equation (A4). In this expression we encounter two further quantities
[symbols not legible in the source]. The first may be found from the iterative equation
resulting from taking the derivative of (A1); the second may be estimated with a CA[0] or
response propagation algorithm, since it is of the order of $C^{(i)}_{klm}$.

In the paramagnetic phase of pair interaction networks without external field, simplifications
occur since we may exploit the fact that odd moments of distributions are zero. Consequently,
all terms $M_i$, $M^{(i)}_j$ and $C^{(i)}_{klm}$ vanish and the recursive update relations for
the derivatives of (A1) reduce to

[Eq. (A8) is not legible in the source.]

The solution of these equations is to be substituted in

[Eq. (A9) is not legible in the source.]

on the graph without variable $i$, yielding a CA[1] computation of $C^{(i)}_{kl}$. This results
in improved CA[2] estimates of correlations via equation (A6), which simplifies greatly due to
the vanishing of odd moments, i.e.

[The final equation is not legible in the source.]